An Empirical Study on Improving Hierarchical Phrase-Based Translation Using Alignment Features

Authors

  • Songfang Huang
  • Bowen Zhou
Abstract

In this paper, we empirically investigate three new features derived from word alignments to improve speech-to-speech translation on mobile devices for low-resource languages. The three features are: one on the alignment of the boundary words of the target-side phrase, one on the balance of terminal words between the source and target sides, and one on the number of unaligned words. We carry out experiments in both directions (E2F and F2E) for Pashto and Dari, two official languages of Afghanistan. With the proposed alignment features, we obtain improvements (up to 1% BLEU score) on the test sets for both Pashto and Dari.
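The abstract names the three features but does not give their exact definitions. The following is a minimal Python sketch of one plausible reading, not the authors' formulation. The function name `alignment_features`, the use of `None` to mark nonterminal gaps in a hierarchical rule, and the specific choices (a binary boundary indicator, an absolute difference for balance, a count over both sides for unaligned words) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# A word alignment is a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]


def alignment_features(
    src: List[Optional[str]],
    tgt: List[Optional[str]],
    links: Alignment,
) -> Dict[str, int]:
    """Illustrative alignment features for one hierarchical rule.

    Assumption: None entries in src/tgt stand for nonterminals (gaps);
    all other entries are terminal words.
    """
    src_terms = [i for i, w in enumerate(src) if w is not None]
    tgt_terms = [j for j, w in enumerate(tgt) if w is not None]
    aligned_src = {i for i, _ in links}
    aligned_tgt = {j for _, j in links}

    # Feature 1 (assumed form): 1 if both the first and the last
    # target-side terminal carry at least one alignment link, else 0.
    boundary_aligned = int(
        bool(tgt_terms)
        and tgt_terms[0] in aligned_tgt
        and tgt_terms[-1] in aligned_tgt
    )

    # Feature 2 (assumed form): absolute difference between the number
    # of terminals on the source side and on the target side.
    terminal_imbalance = abs(len(src_terms) - len(tgt_terms))

    # Feature 3 (assumed form): terminals on either side with no link.
    unaligned = sum(1 for i in src_terms if i not in aligned_src) + sum(
        1 for j in tgt_terms if j not in aligned_tgt
    )

    return {
        "boundary_aligned": boundary_aligned,
        "terminal_imbalance": terminal_imbalance,
        "unaligned": unaligned,
    }
```

In a log-linear decoder, values like these would typically be attached to each extracted rule and given tunable weights alongside the standard translation and language model scores; how the paper integrates them is not stated in the abstract.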


Similar resources

Extending a probabilistic phrase alignment approach for SMT

Phrase alignment is a crucial step in phrase-based statistical machine translation. We explore a way of improving phrase alignment by adding syntactic information in the form of chunks as soft constraints guided by an in-depth and detailed analysis on a hand-aligned data set. We extend a probabilistic phrase alignment model that extracts phrase pairs by optimizing phrase pair boundaries over th...


Statistical Machine Translation Based on Hierarchical Phrase Alignment

This paper describes statistical machine translation improved by applying hierarchical phrase alignment. The hierarchical phrase alignment is a method to align bilingual sentences phrase-by-phrase employing the partial parse results. Based on the hierarchical phrase alignment, a translation model is trained on a chunked corpus by converting hierarchically aligned phrases into a sequence of chun...


Hierarchical Translation Equivalence over Word Alignments

We present a theory of word alignments in machine translation (MT) that equips every word alignment with a hierarchical representation with exact semantics defined over the translation equivalence relations known as hierarchical phrase pairs. The hierarchical representation consists of a set of synchronous trees (called Hierarchical Alignment Trees – HATs), each specifying a bilingual compositi...


Improving Statistical Word Alignment with Various Clues

This paper proposes a method to improve word alignment by combining various clues. Our method first trains a baseline statistical IBM word alignment model. Then we improve it with various clues, which are mainly based on features such as lemmatization, translation dictionary, named entities, and chunks. We incorporate these features into a unified framework. Experimental results show that our ...


Lexicon models for hierarchical phrase-based machine translation

In this paper, we investigate lexicon models for hierarchical phrase-based statistical machine translation. We study five types of lexicon models: a model which is extracted from word-aligned training data and—given the word alignment matrix—relies on pure relative frequencies [1]; the IBM model 1 lexicon [2]; a regularized version of IBM model 1; a triplet lexicon model variant [3]; and a disc...




Publication date: 2011